Abstract


Using  software  components  to  develop  mission-critical  systems  poses  a  number  of  technical, 
organizational,  and  economic  challenges.  One  persistent  and  largely  unaddressed  challenge  is 
how the consumers of software components (that is, the developers of mission-critical
systems) can obtain a meaningful level of trust in the runtime behavior of software components.
The  most  frequently  cited  concerns  are  centered  on  issues  of  security;  for  example,  trust  that  a 
component  does  not  contain  malicious  code  or  exhibit  vulnerabilities  that  can  be  exploited  by 
malicious  code.  There  are,  however,  other  concerns  about  software  component  behavior  that 
can  be  just  as  important.  For  example,  in  an  embedded  weapon  system,  it  may  be  crucial  to 
trust  that  a  component  will  always  execute  a  function  within  a  particular  time  bound  or  never 
introduce  unbounded  priority  inversion. 

Certification  is  a  practical,  proven  means  of  establishing  trust  in  various  sorts  of  things  in  other 
disciplines  and  is,  therefore,  a  natural  contender  for  developing  trust  in  software  components. 
This  technical  note  does  not  propose  a  particular  certification  regimen  for  components.  Rather, 
it introduces a series of 10 distinctions that can help in understanding different aspects of
certification in the context of software components.
1 Introduction


The  Predictable  Assembly  from  Certifiable  Components  (PACC)  Initiative  at  the  Carnegie 
Mellon®1 Software Engineering Institute (SEI) is investigating how the behavior of systems
comprising  assemblies  of  software  components  can  be  reliably  inferred,  or  predicted,  from  the 
trusted  properties  of  the  software  components  themselves. 

There  is  more  than  a  little  subtlety  in  the  phrasing  of  this  research  agenda.  In  particular,  we  use 
the phrase “trusted properties of software components” rather than “trusted software
components” to indicate that we may trust some properties of a component but not others. We also
place  the  topic  of  component-level  trust  within  the  broader  context  of  system-level  (or,  more 
properly,  assembly-level)  predictability;  doing  so  indicates  the  intended  subservience  of 
trusted  components  to  an  encompassing  activity  involving  engineering  analysis. 

Another  subtlety  is  that  the  title  of  our  initiative  uses  the  adjective  “certifiable”  rather  than 
“certified.” Our interest in trusted components lies in how they are used to make reliable
predictions about assembly behavior. Our interest in certification lies in how it can be used to
engender trust. Certification is, therefore, at least one step removed from our fundamental
agenda: reliable prediction. On the other hand, the conditions that make a component
certifiable are of direct importance to our agenda, since trustworthy predictions are impossible
without trustworthy component properties.

Hermeneutics  aside,  it  is  apparent  that  our  perspective  (that  of  the  PACC  Initiative)  on  which 
aspects  of  software  components  are  worth  trusting — namely,  the  properties  of  components  that 
are  parameters  to  analytic  theories  of  assembly  behavior — is  not  wholly  conventional.  This 
perspective  may  unintentionally  put  us  at  odds  with  the  more  conventional,  and  familiar, 
notions  of  software  certification  used  in  the  software  industry.  The  resulting  terminological 
confusion may obscure the ideas underlying the PACC premise regarding trusted components,
ideas whose validity does not hinge on certification per se.

An  important  question  for  the  PACC  Initiative,  then,  is:  Can  the  conventional  understanding  of 
software  certification  be  adopted,  or  perhaps  coerced,  in  the  service  of  predictable  assembly? 
The question presupposes that there is, in fact, a well-developed conventional notion of
software certification. There are, in fact, many notions of software certification currently
implemented in different ways.


1. Carnegie Mellon is registered in the U.S. Patent and Trademark Office by Carnegie Mellon University.
For the term certifiable components to have technical meaning in PACC, it will have to be
situated in a much clearer model of software certification than is currently available. This
technical note is our first step toward exploring the software certification landscape by presenting a
space  of  alternative  concepts  rather  than  a  compendium  of  specific  approaches.  The  device 
used  to  describe  this  space  is  a  series  of  10  distinctions,  each  expressing  2  concepts  that  may 
be  similar  but  not  identical,  partially  conflicting,  or  mutually  exclusive  to  one  another. 

These distinctions do not completely characterize the conceptual space of software
certification. They are, however, representative of what we have encountered in the literature and at
various  fora  on  trusted  software  and  software  certification.  The  distinctions  are  not  used  in  this 
technical  note  to  classify  different  approaches,  but  rather  to  expose  concerns  that  underlie  the 
topic  as  a  whole. 

As background, Chapter 2 defines the term certification. Chapter 3 briefly discusses the 10
distinctions;  our  goal  in  that  chapter  is  to  help  the  reader  develop  general  intuition  rather  than 
to  present  an  argument  in  detail.  A  representative  but  by  no  means  comprehensive  review  of 
related  work  is  provided  in  Chapter  4.  Finally,  Chapter  5  presents  a  few  tentative  conclusions 
about  the  distinctions  discussed  in  this  technical  note. 
2 Definition of Certification


We  should  not,  at  the  outset,  prejudice  how  certification  may  be  applied  to  software  and,  in 
particular,  to  software  components.  The  dictionary  definition  below  gives  us  insight  into  the 
conventional  meaning  of  certification  and  exposes  what  may  be  a  source  of  many  hidden 
assumptions  about  how  it  does,  or  should,  relate  to  software  components: 

1. to attest as certain; give reliable information of; confirm... 2. to testify to or
vouch for in writing... 3. to guarantee; endorse reliably; to certify a document
with an official seal... [MWU 96]

This  definition  is  a  marvel  of  efficiency,  yet  is  suggestive  in  several  ways: 

•  to  attest  as  certain  suggests  an  authoritative  testimonial  or  “standing  by”  a  statement 

•  (to  give)  reliable  information  suggests  an  objective  quality 

•  (to)  confirm  suggests  a  corroboration  of  other  information 

•  (to)  vouch  for  in  writing  suggests  evidence  and  legal  implications 

•  to  guarantee  suggests  contractual  obligations  and  remedies 

•  (to  use)  an  official  seal  suggests  a  designated  authority 

More  specialized  definitions  are  available;  for  example,  legal  and  medical  definitions  of  the 
term  have  emerged  over  time.  These  specializations  are  consistent  with  the  general  meaning 
but  are  particular  in  their  application.  For  our  purpose,  the  general  definition  is  a  sufficient 
starting  point. 
3 Ten Distinctions


3.1  Truth  Vs.  Knowledge 

The full title of Mary Shaw’s position paper nearly says it all: “Truth Vs. Knowledge: The
Difference Between What a Component Does and What We Know It Does” [Shaw 96].

On the one hand, this distinction expresses nothing beyond an age-old epistemological
question about the basis of human knowledge and its relation to reality. On the other hand, it does
posit  an  axiom  of  any  software  certification  regimen:  all  knowledge  about  the  properties  of  a 
software  component  will  be  provisional.  When  this  axiom  is  accepted,  any  robust  treatment  of 
certification  must  account  for  this  provisionality  and  possibly  make  it  explicit. 

We  are  accustomed  to  some  level  of  provisionality  for  software.  For  example,  performance 
benchmarks of, say, operating systems involve experimental measurement and, hence,
introduce potential sources of systematic and random error. In fact, any properties of software
components that are measured (for example, through functional testing) will introduce sources of
experimental error. In all cases, we would be surprised by, and should be skeptical of, any
purported measures if they were not also accompanied by some sort of statistical qualification.
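
As a minimal sketch of what such a statistical qualification might look like, the following Python fragment reports a mean execution time together with an approximate 95% confidence interval. The function name qualify_measurements, the sample values, and the choice of a normal approximation (z = 1.96) are illustrative assumptions and are not drawn from any certification regimen discussed in this note. An interval of this kind speaks only to random error; systematic error would need separate treatment.

```python
import math
import statistics


def qualify_measurements(samples, z=1.96):
    """Return (mean, half_width) for an approximate confidence interval.

    Hypothetical helper. Assumes independent samples and a normal
    approximation; z=1.96 corresponds to roughly 95% confidence.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    mean = statistics.mean(samples)
    # Sample standard deviation (n - 1 in the denominator).
    stdev = statistics.stdev(samples)
    # The standard error of the mean shrinks with the square root of n.
    half_width = z * stdev / math.sqrt(len(samples))
    return mean, half_width


# Illustrative execution times in milliseconds (made-up values).
times_ms = [10.0, 12.0, 11.0, 13.0, 14.0]
mean, half_width = qualify_measurements(times_ms)
print(f"execution time: {mean:.2f} ms +/- {half_width:.2f} ms")
```

A benchmark figure reported this way makes its provisionality explicit: the consumer sees both the estimate and how far it might plausibly be from the true value.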

We  are  less  accustomed  to  the  idea  that  provisionality  applies  equally  to  properties  of  software 
components  established  by  purely  formal  means,  that  is,  through  means  of  logic  and  proof. 
What  does  it  mean  for  a  proof  of  total  correctness  in  Hoare  logic  to  be  provisional?  Just  this: 
The proof in question applies to a specification that will undergo a sequence of
transformations (e.g., from source to intermediate to assembly to machine instructions), each of which is
assumed  (usually  with  no  formal  or  empirical  basis)  to  be  correct. 

Although  this  last  point  is  a  well-known  bugaboo  noted  by  critics  of  formal  methods,  it  is  not 
our  main  concern.  Instead,  we  are  concerned  with  understanding  the  provisionality  of  formal 
assertions, especially since a recent trend in formal methods is the use of fully automated
verification technology, such as temporal logic model checking. It is reasonable to assume that
software implementing complex verification procedures is also susceptible to bugs.2


2.  You  might  find  it  helpful  to  examine  the  release  history  of  the  well-known  Spin  model  checker  provided  at  http://spinroot.com. 
3.2 Knowledge Vs. Trust

This distinction can also be linked to basic epistemological questions because a great deal of
what  we  claim  to  know  about  the  world  is  acquired  not  through  direct  experience,  but  rather 
through  description.3  This  sense  of  the  distinction  is  not  what  is  emphasized  in  this  technical 
note. 

Meyer,  Mingins,  and  Schmidt  express  the  intended  meaning  of  this  distinction  in  their  article 
titled “Trusted Components for the Software Industry” [Meyer 98]. In particular, they argue
(correctly in my judgement) that trust is a social phenomenon. One (revealing) supporting
argument is that even the acceptance (“trust”) of mathematical proof is influenced by social factors
such as implicit agreement on what is an acceptable level of rigor in proofs, the perceived
reputation of the author of the proof, and so forth.

While  this  social  conditioning  has  been  noted  elsewhere  [Demillo  77],  Meyer,  Mingins,  and 
Schmidt draw software-specific implications related to the question of certification.
Specifically, they conclude that a broad and varied array of evidence is required to establish a social
foundation  for  trust.  From  that  conclusion,  we  can  infer  that  certification,  in  the  sense  of 
authoritative  labeling,  may  sometimes  be  insufficient  in  itself  to  establish  trust;  conversely,  it 
might  be  necessary  in  certain  communities  accustomed  to  the  stricture  of  certification;  for 
example,  the  nuclear  power  industry. 

Note  that  trust  is  not  absolute — absolute  trust  is  better  denoted  as  faith.  The  provisional  nature 
of  trust  is  also  noted  by  Meyer,  Mingins,  and  Schmidt.  As  a  result,  we  can  also  conclude,  as 
did  Mary  Shaw,  that  a  robust  treatment  of  certification  will  address  the  inherent  boundedness 
of  the  certainty  of  empirical,  and  even  formal,  evidence. 


3.3  Normative  Vs.  Descriptive 

Normative  certification  attests  that  software  conforms  to,  or  satisfies,  some  established  norm. 
Descriptive  certification  merely  “describes”  some  aspect  of  the  software. 

Examples  of  normative  certification  are  legion  and  include  the  following:  conformance  to 
interface standards, for example, POSIX;4 test suites, for example, the Ada Compiler
Validation Capability (ACVC) benchmarks;5 or quality-attribute-specific norms, for example, for


3.  A  classic  example  of  this  is:  I  know  the  Taj  Mahal  exists,  and  yet  I  have  never  seen  it  (except,  again,  provisionally  through 
photographs).  Is  this  knowledge  trustworthy?  How  do  you  decide? 

4.  For  more  information,  go  to  http://www.opengroup.org/testing/fips/. 

5.  For  more  information,  go  to  http://www.iste.uni-stuttgart.de/ps/AdaBasis/pal_1 1 95/ada/a]po/compilers/95val/val-proc.txt. 
security, the storied “Orange Book,”6 or, for safety, a menagerie of standards, including
UL1998,  IEEE  1228,  IEC  60880,  IEC  60950,  and  IEC  61508. 

Examples  of  descriptive  certification  are,  by  comparison,  rare.  Performance  benchmarks  have 
been defined for particular kinds of software; for example, middleware and database
technology.7 Voas has long advocated the descriptive certification of reliability and has proposed
several mechanisms for acquiring such measures [Voas 98], [Voas 99], [Voas 00]. His proposals
emphasize  the  descriptive  rather  than  normative  aspects  of  certification  because  no  threshold 
(norm)  that  components  must  satisfy  has  been  defined. 

Normative  certification  is  the  rule  in  software;  descriptive  certification  is  nearly  nonexistent. 
This  situation  is  no  doubt  due,  in  part,  to  the  dearth  of  accepted  measures  for  software — with 
the  exception  of  “source  line  of  code”  and  “function  point”— and  even  these  are  disputed.  In 
contrast,  electronic  (hardware)  components  have  numerous  descriptive  measures  such  as 
whetstones,  dhrystones,  and  clock  speed  for  processors;  mean  time  to  failure;  access  time  and 
seek time for disk drives; and so forth. Generally, these measures are used to provide
manufacturer-rated specifications.

In  general,  we  can  conclude  that  certification  tends  toward  normative  interpretation,  partly 
because  the  notions  of  authority  and  the  legal  and  economic  sanctions  implied  by  the  general 
definition  of  certification  (see  Chapter  2)  favor  notions  of  compliance  to  established  norms. 
Still,  where  standard  measures  of  component  properties  are  available — as  for  certain  classes  of 
hardware  components — some  level  of  descriptive  certification  can  take  shape  simply  to  allow 
manufacturers  to  differentiate  their  products  in  the  marketplace. 

The following argument is worth considering: normative certification is really descriptive
certification with associated acceptance criteria or, more generally, a classification threshold for
acceptance. If this argument is valid, descriptive certification would appear to be of
independent significance since it permits the definition of various norms rather than a single
“one size fits all” norm.
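
The argument can be made concrete with a small sketch. In the Python fragment below, a descriptive certificate records a measured value and nothing more, while a norm is an acceptance threshold applied to that value. All names (DescriptiveCertificate, make_norm, the measure name) and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptiveCertificate:
    """A measured property of a component, with no judgement attached."""
    component: str
    measure: str
    value: float


def make_norm(measure, threshold, higher_is_better=True):
    """Build a normative check from an acceptance threshold (hypothetical)."""
    def conforms(cert):
        if cert.measure != measure:
            raise ValueError(f"certificate does not describe {measure!r}")
        if higher_is_better:
            return cert.value >= threshold
        return cert.value <= threshold
    return conforms


# One descriptive certificate (the value is made up for illustration).
cert = DescriptiveCertificate("parser", "mean_time_to_failure_hours", 5000.0)

# Two different norms defined over the same description.
lenient_norm = make_norm("mean_time_to_failure_hours", 1000.0)
strict_norm = make_norm("mean_time_to_failure_hours", 100000.0)

print(lenient_norm(cert))  # the component satisfies the looser norm
print(strict_norm(cert))   # but not the stricter one
```

The same description supports both verdicts, which is the sense in which descriptive certification has independent significance: the norm can be chosen by each community of use rather than fixed by the certifier.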


3.4  Objective  Vs.  Subjective 

Objective measures are those that depend only on the object of study and possibly some
physical apparatus. In contrast, subjective measures are the product of some mental activity. While
this  distinction  is  clear,  an  illustration  can  help  explain  some  less  than  obvious  subsidiary 
points. 


6.  For  more  information,  go  to  http://www.radium.ncsc.mil/tpep/library/rainbow/5200.28-STD.html. 

7.  For  more  information  on  benchmarking  transaction  processing  systems,  go  to  http://www.tpc.org/.  To  see  various  papers  on 
benchmarking middleware systems, go to http://nenya.ms.mff.cuni.cz/projects/corba/oopsla-workshop-03/.
Consider, for example, measures of a quality attribute that we might call the understandability
of a computer program. A candidate objective measure of this quality is “cyclomatic
complexity,” which is defined as a numerical measure derived from the structure of a computer
program (in particular, control flow).8 A candidate subjective measure might be a statement posed
to  a  human  subject,  such  as  “this  code  is  understandable,”  with  a  numerical  measure  derived 
from  responses  to  this  statement  ranging  from  “strongly  agree”  to  “strongly  disagree.” 
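
To make the contrast tangible, the sketch below computes both kinds of measure for a small Python function. The objective measure approximates cyclomatic complexity by counting decision points and adding one, which matches the control-flow definition for structured, single-entry code; the set of syntax nodes counted is a simplifying assumption. The subjective measure is the mean of survey responses coded from 1 (“strongly disagree”) to 5 (“strongly agree”). Function names, the sample program, and the responses are all hypothetical.

```python
import ast
import statistics

# Node types treated as decision points (an assumed, simplified set).
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source):
    """Approximate cyclomatic complexity: decision points + 1."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, _DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" contributes two additional branches.
            decisions += len(node.values) - 1
    return decisions + 1


def likert_score(responses):
    """Mean of responses coded 1 (strongly disagree) to 5 (strongly agree)."""
    return statistics.mean(responses)


SAMPLE = """
def classify(x):
    if x < 0 and x != -1:
        return "negative"
    for _ in range(3):
        if x == 0:
            return "zero"
    return "positive"
"""

print(cyclomatic_complexity(SAMPLE))
print(likert_score([4, 5, 3, 4]))
```

Running the first function twice on the same source always yields the same number, which is the repeatability that objectivity buys. Whether that number tracks understandability any better than the survey score is a separate question.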

The  virtue  of  objectivity  is  that  the  measure  is  repeatable.  But  repeatability  does  not  suggest 
anything  about  other  important  qualities  of  the  measure  such  as  reliability.  It  is  not  clear 
whether  cyclomatic  complexity  is  more  reliable  than  the  proposed  subjective  measure,  and,  in 
fact,  there  is  reason  to  think  just  the  opposite. 

Note, too, that quantification is not the sole purview of objectivity; in practice, subjective
measures are quite likely to be quantified. A significant body of literature on multi-attribute
decision making depends on quantified judgement. Also, Fenton and Pfleeger provide a nice
discussion of the coevolution of measures, from qualitative to quantitative in tandem with an
improved understanding of the phenomenon being measured [Fenton 97]. Their thesis
suggests that objective measures of software will become increasingly prominent, since
qualitative measures (which are supplanted as understanding improves) are inherently subjective.


3.5  Product  Vs.  Process 

In  this  report,  we  have  tacitly  assumed  that  the  subject  of  certification  is  the  software  product 
itself.  Yet,  normative  certification  standards  for  software  tend  to  place  significant  importance 
on  the  development  processes  used  to  create  the  software.  This  again  may  reflect  the  dearth  of 
reliable (objective or subjective) measures of software quality. An emphasis on process
certification illustrates the use of an indirect, or proxy, measure of software quality.

Product certification is likely to be most useful where conformance testing can be fully
mechanized, such as the earlier examples of certifying interface compliance (the POSIX example)
and test suite compliance (the ACVC example). Product certification has the merit of
emphasizing objective measures, which, as previously noted, improves repeatability. We have a
strong intuition (not substantiated by measure theory or current practice) that product
certification will yield more reliable measures of software quality than process certification, despite
the  known  correlation  between  better  software  process  and  better  quality  software. 

Process certification is more useful when expert judgement is required and the cost of
formulating expert judgements from product artifacts is high. For example, certifying that software
is  safe  in  so-called  human-rated  systems  requires  detailed  examination  of  design  artifacts  (for 


8. For more details on this measure, go to http://www.sei.cmu.edu/str/descriptions/cyclomatic_body.html.
example, requirements specifications, architectural specifications, and source code). A less
costly  but  possibly  reliable  measure  may  be  the  quality  of  the  software  engineering  processes 
used  to  manage  requirements,  architectural  design,  and  implementation. 


3.6  Local  Vs.  Contextual 

Local properties of software are those that can be ascertained independently of their use
context. Contextual properties, in contrast, are just the opposite: they require a context.9 A local
property  of  software  might  be  its  size  in  source  lines  of  code — which  is  an  illustrative,  if  not 
generally  useful,  measure.  A  contextual  measure  of  the  same  software  might  be  its  reliability, 
since  reliability  will,  in  most  cases,  depend  entirely  on  how  the  software  is  used — that  is,  on 
what  input  it  receives  from  its  environment  (its  context). 

This  dichotomy  is  of  direct  relevance  to  the  question  of  certifiable  software  components,  since 
the  properties  of  software  that  are  of  most  interest  tend  to  be  contextual.  In  other  words,  the 
properties  of  interest  tend  to  be  those  exhibited  by  systems  as  a  whole  rather  than  by  any  one 
part  of  a  system.  Put  in  yet  another  (but  more  systems-theoretic)  way,  the  properties  of  interest, 
such  as  safety,  tend  to  be  emergent  properties — those  that  arise  as  a  result  of  the  interactions 
among  the  parts  of  a  system.  (For  a  discussion  of  emergence,  see  the  monograph  on  safety 
engineering  by  Leveson  and  associates  [Leveson  04]). 

This  notion  of  emergence  has  some  rather  obvious  implications  for  the  prospects  of  certifying 
software  components.  Certifying  that  a  software  component  is  safe  requires  the  certification  to 
be  “contextualized”  in  at  least  two  ways.  First,  the  definition  of  what  it  means  for  a  component 
to  behave  safely  will  vary  from  context  to  context.  Second,  the  certified  component  property 
must be correlated in some way to the emergent behavior, since that, rather than the
component behavior, is what is of interest.


3.7  Determinate  Vs.  Predictive 

We say that certification is determinate if norms can be established for local properties of
components and predictive if otherwise.10


9. Fenton and Pfleeger [Fenton 97] refer to this distinction as internal (for local) and external (for contextual) measures, but the
terminology adopted in this technical note is more transparent. Also, the antonym of local is global, which might better
represent the distinction. However, global suggests an absolute notion, whereas contextual suggests a more relative notion,
that  is,  the  context  for  one  software  element  may  simultaneously  be  an  element  in  another  (its  own)  enclosing  context.  For 
this  reason,  I  prefer  the  term  contextual. 

10. Of the distinctions presented, this is the only one that directly reflects our research bias. The other distinctions reveal
something of the state of the practice of software certification, while this one reveals something of our proposed improvement of
this  practice.  This  distinction  is  also  the  most  difficult  to  describe;  more  work  is  needed  to  understand  the  underlying  issue. 
To illustrate predictive certification, consider the contextual property “the system satisfies all
performance deadlines.” Also imagine that our interest is in certifying the performance
properties of the system’s software components. Does the software component have any purely local
properties  that  can  be  certified  as  conforming  to  this  contextual  property?  Probably  not.  We 
might  measure  the  non-blocking  execution  time  of  all  component  traces  (non-blocking  to 
eliminate  non-local  effects).  The  validity  of  this  measure  assumes  that  traces  are  not  dependent 
on environmental input, a situation that is unlikely. We are also excluding the role of
processor, memory, and so on as contextual elements of such benchmarks.

Although the measure of execution time is arguably not contextual, it is clear that this
particular measure is insufficient for determining whether the real contextual requirement (meeting
deadlines) is satisfied. For this purpose, we need a sufficiently detailed model of the context,
one  into  which  we  can  “plug”  the  component  execution  time  to  determine  (predict)  deadline 
satisfaction.  Even  if  the  context  were  defined  well  enough  for  execution  time  norms  to  be 
established  for  components,  the  effect  is  still  predictive  rather  than  determinate.  That  is  true 
because, in any case, the component execution time is insufficient in itself to fulfill the
contextual property of deadline satisfaction.
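
One way to picture “plugging” a component execution time into a model of the context is fixed-priority response-time analysis, a standard technique from real-time scheduling theory. It is used here purely as an illustration and is not claimed to be the analytic theory this note has in mind. The sketch assumes periodic, independent tasks on a single preemptive processor, deadlines equal to periods, integer time units, and tasks listed from highest to lowest priority; the function names and task sets are hypothetical.

```python
def response_time(tasks, index):
    """Worst-case response time of tasks[index], or None if it misses.

    tasks: list of (execution_time, period) pairs in integer time units,
    ordered from highest to lowest priority. Deadline == period (assumed).
    """
    cost, period = tasks[index]
    higher = tasks[:index]
    r = cost
    while r <= period:
        # Each higher-priority task can preempt ceil(r / T) times within r.
        interference = sum(-(-r // t) * c for c, t in higher)
        nxt = cost + interference
        if nxt == r:
            return r  # fixed point reached: this is the response time
        r = nxt
    return None  # the response time exceeds the deadline


def predict_deadlines(tasks):
    """Predict, per task, whether its deadline is met in this context."""
    return [response_time(tasks, i) is not None for i in range(len(tasks))]


# The same component (execution time 3, period 12) in two contexts.
light_context = [(1, 4), (2, 6), (3, 12)]
heavy_context = [(2, 4), (2, 6), (3, 12)]

print(predict_deadlines(light_context))
print(predict_deadlines(heavy_context))
```

The component's certified execution time is identical in both task sets, yet the prediction for it differs because the surrounding tasks differ. The execution time is a parameter of the prediction, not the prediction itself.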

To illustrate determinate certification, consider first the rather simple properties of
conformance to interface specifications or the satisfaction of specified test cases (again, the POSIX and
ACVC examples). These properties are determinate by definition. A less trivial illustration is
to certify the total correctness of a software component; that is, that it always computes the
correct result and terminates for input within specified bounds. Here, the contextual
assumptions are finessed, but not eliminated, by circumscribing the behavior of the context (the
environment). The result is still determinate, however, because the proof of correctness requires
only  these  assumptions. 

If  this  last  illustration  is  modified,  however,  from  certifying  component-level  correctness  to 
certifying  system-level  safety,  matters  become  more  obscure.  We  may  now  require  (as  we  did 
in  the  performance  illustration)  a  model  of  the  context  that  is  detailed  enough  to  expose  those 
patterns  of  interaction  among  components  from  which  safety  will  emerge.  Now  the  proof  of 
component  correctness  is  just  a  parameter  of  a  larger  proof  of  safety.  Further  contextual 
assumptions  may  need  to  be  more  detailed;  for  example,  those  in  the  “assume-guarantee” 
clauses  that  are  often  required  to  construct  a  compositional  proof  of  correctness  of  concurrent 
software.  Certification  may  still  be  determinate  in  this  case,  but  the  boundaries  are  becoming 
blurred. 

To  recap,  the  performance  properties  of  components  can  be  certified  only  predictively,  since 
these  properties  are  inherently  contextual.  In  contrast,  properties  that  are  inherently  local  to 
components  can  be  determinately  certified.  A  grey  area  arises  when  the  properties  are  local  to 
components, but the norms are contextual. If we accept Fenton and Pfleeger’s thesis that
contextual properties are the main ones of interest, perhaps we should also accept that predictive
certification is likely to be more interesting than determinate certification for software
components.


3.8  Formal  Vs.  Empirical 

This  distinction  differentiates  those  software  component  properties  established  through  proofs 
in  some  system  of  logic  (formal  properties)  from  those  established  through  observation  and 
measurement  (empirical  properties). 

It is natural to think of formal techniques as constituting a gold standard for the software
properties dependent exclusively on software (i.e., those not dependent on the peculiarities of a
particular  computing  environment  such  as  memory  and  processor  speed)  and  to  consider 
empirical  properties  as  a  compromise  at  best.  The  second-class  status  of  empirically  derived 
evidence is certainly implied by Meyer, who assigns formal proof the high-road status, while
relegating non-proof-theoretic approaches to low-road status [Meyer 03]. It is difficult to
disagree with this assessment, especially if a computer program is accorded its true stature as a
formal  specification. 

In  some  cases,  though,  we  must  rely  on  measurement  for  those  properties  dependent  on  the 
physical  computing  environment;  for  example,  the  consumption  of  time  and  power  and  the 
generation  of  heat — all  of  which  may  depend  on  the  number  of  instructions  needed  to  compute 
a  particular  function,  on  the  mix  of  instructions  used,  and  on  the  characteristics  of  hardware 
(which  often  vary  from  batch  to  batch  in  a  manufacturing  process). 

Measurement  may  also  be  useful,  however,  when  applied  on  a  scale  that  would  be  impractical 
for  formal  techniques.  There  are  analogues  in  the  physical  sciences.  For  example,  the  ideal  gas 
laws  are,  in  principle,  an  approximation  of  forces  that  could  be  computed  by  applying  the  laws 
of motion, assuming elastic collision, on the scale of molecular interaction [Fermi 56]. There
is  no  need  to  do  so,  however,  since  the  law  of  large  numbers  assures  us  that  the  ideal  gas  laws 
adequately  describe  the  emergent  properties  of  temperature  and  pressure  at  a  particular  level  of 
system  description. 

This  ideal  gas  law  example  may  seem  contrived  and  far  removed  from  software,  but  software 
analogues  are  not  hard  to  find.  For  example,  automata-theoretic  models  of  program  execution 
provide  a  sound  foundation  for  microscopic  theories  of  program  behavior,  and  those  models 
are  exploitable  by  formal  verification  techniques  such  as  temporal  logic  model  checking. 
These  techniques  suffer  from  state  space  explosion,  however,  for  even  moderately  complex 
computer  programs,  to  say  nothing  of  large-scale  systems  comprising  many  hundreds  of  such 
programs. At these levels of system scale, various forms of statistical quality control and
statistical testing have proven more practical and effective.
Probability theories that provide meaningful models of the behavior of software systems with
large  and  potentially  unbounded  state  space  have  been  applied  also  in  the  new  discipline  of 
real-time  queuing  theory  (RTQT)  [Lehoczky  97]  and,  quite  recently,  in  survivability  models  of 
software systems [Jha 01]. These techniques are favored because they provide more
convenient and tractable analysis of probabilistic behavior, while still exhibiting the soundness of an
underlying  mathematical  theory.  And,  as  noted  earlier,  such  theories  can  provide  a  basis  for 
predictive  certification. 

Formal  and  empirical  may  be  described  here  as  extremes  of  a  dichotomy,  but  they  are  not 
mutually  exclusive.  For  example,  model  checking  and  testing  have  been  combined  by  using 
empirical  testing  to  verify  that  a  developed  system  is  indeed  a  refinement  of  an  abstract  model, 
where  that  model  is  subjected  to  formal  verification  [Havelund  01]. 


3.9  Procedural  Vs.  Mechanical 

Procedural  certification  mandates  a  role  for  some  authorized,  human,  certifying  agent.  In  other 
words, procedural certification involves a “human in the loop.” In contrast, mechanical certification
can, in principle at least, be fully automated.11

Normative  certification  standards  currently  in  use  in  the  software  industry  (see,  for  example, 
the  safety  standards  mentioned  earlier)  are  procedural.  It  is  hardly  possible  to  conceive  that 
compliance  with  these  standards  can  be  reduced  to  mechanical  checking.  Moreover,  these 
standards  have  spawned  a  support  industry  of  independent  verification  and  validation  (IV&V) 
laboratories  that  provide  testing  services  for  standards  compliance.  This  business  model  serves 
industry  well,  since  IV&V  allows  for  a  specialization  of  skills,  provides  a  means  of  reducing 
the  liability  exposure  of  system  producers,  and  helps  generate  social  trust. 

Mechanical  certification  has  appeared  in  the  commercial  marketplace.  A  notable  example  is 
Microsoft’s approach to certifying device drivers. Drivers are subjected to a battery of compliance
tests, including the use of a fully automated suite of model checking tools that formally
verify  a  specified  set  of  safety 12  conditions  (a  “safety  policy”).  The  logic  of  a  mechanical 
approach  seems  well  suited  to  Microsoft’s  business  need  to  establish  social  trust  in  its  products 
without  requiring  a  large  investment  in  a  “human  in  the  loop”  certification  process. 

11. The formulation of mechanical versus procedural is preferred to automated versus manual because some processes that
are automatable in principle may, for practical reasons, make use of manual processes (the running of test programs, for
example). The formulation chosen in this technical note reflects the inherently procedural nature of an activity.

12. Note that this use of the term safety refers to a particular class of properties that can be established by examining the prefix
of finite execution traces. This is not the same notion of safety used by the various safety standards mentioned earlier.

Intriguing possibilities for mechanical certification are also emerging in research in proof-carrying
code (PCC), sometimes referred to as self-certifying code [Necula 96]. In one variant of
this concept, a consumer publishes a safety policy,12 and certifying compilers insert a proof of
this safety policy’s satisfaction directly into component code. The target system in which the
component will be placed executes an efficient check of the proof. One interesting characteristic
of this approach is that fraudulent proofs are definitely rejected, while adulterated code (for
example, code that has been tampered with or otherwise compromised) will be accepted only
if it satisfies the published safety policy.
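
The producer/consumer exchange described above can be illustrated with a deliberately simplified sketch. It is not Necula and Lee's technique: real PCC attaches a formal proof in a logic, whereas this example assumes a hypothetical toy stack machine, an assumed safety policy ("no instruction pops more values than the stack holds, and control never leaves the program"), and a certificate that merely lists the claimed stack depth on entry to each instruction. All names here are illustrative.

```python
from typing import List, Tuple

Instr = Tuple[str, int]  # (opcode, argument); a hypothetical toy instruction set

# (values popped, values pushed) for each opcode of the toy stack machine
EFFECT = {"PUSH": (0, 1), "ADD": (2, 1), "POP": (1, 0),
          "JMP": (0, 0), "JZ": (1, 0), "HALT": (0, 0)}


def check_certificate(code: List[Instr], depths: List[int]) -> bool:
    """Consumer-side check of the producer's certificate in one linear pass.

    depths[i] is the claimed stack depth on entry to instruction i. The check
    accepts only if every claim is consistent with the code, which implies
    that the assumed safety policy holds on every execution.
    """
    if not code or len(depths) != len(code) or depths[0] != 0:
        return False
    for i, (op, arg) in enumerate(code):
        if op not in EFFECT:
            return False
        pops, pushes = EFFECT[op]
        if depths[i] < pops:  # this instruction would underflow the stack
            return False
        after = depths[i] - pops + pushes
        if op == "HALT":
            successors: List[int] = []
        elif op == "JMP":
            successors = [arg]
        elif op == "JZ":
            successors = [arg, i + 1]
        else:
            successors = [i + 1]
        for s in successors:
            # Control must stay inside the program, and the claim at the
            # successor must match the depth this instruction produces.
            if not (0 <= s < len(code)) or depths[s] != after:
                return False
    return True


program = [("PUSH", 1), ("PUSH", 2), ("ADD", 0), ("POP", 0), ("HALT", 0)]
certificate = [0, 1, 2, 1, 0]

honest = check_certificate(program, certificate)
forged = check_certificate([("PUSH", 1), ("ADD", 0), ("HALT", 0)], [0, 2, 1])
harmless_tamper = check_certificate([("PUSH", 7)] + program[1:], certificate)
harmful_tamper = check_certificate([program[0], ("POP", 0)] + program[2:], certificate)
```

In this sketch the honest pair is accepted, the forged certificate is rejected, and tampered code is accepted only when it still satisfies the policy (changing a constant is harmless; replacing a push with a pop is not). The checker trusts neither the producer nor the distribution channel, only its own single pass over the code.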

PCC  is  an  approach  to  generate  trust  in  components  even  if  component  suppliers  and  distribu¬ 
tion  channels  (e.g.,  for  mobile  code)  are  untrusted.  Research  in  so-called  “foundational  PCC” 
takes  this  approach  to  the  extreme  by  systematically  reducing  the  “footprint”  of  the  trusted 
computing base required for PCC [Appel 01]. For example, in Necula and Lee’s original work
[Necula  96],  the  proof  checker  must  be  a  trusted  component.  Foundational  PCC  allows  this 
component,  and  various  others  of  the  original  PCC  work,  to  be  untrusted,  achieving  what  Lee 
has  referred  to  euphemistically  as  “a  conjurer’s  trick”13  of  generating  trust  from  a  trustless 
environment. 


3.10  Standard  Vs.  De  Facto 

One assertion made in this technical note (and elsewhere) is that certification is a means to an
end: it is a tactic that can be used to establish trust and, possibly, reduce liability. Trust and
reduced liability are economic and social ends that transcend the objects of certification
(particular systems or components of systems). Accordingly, in many cases, there are significant
economic and social interests at stake in how certification norms are defined and how compliance
is enforced (sanctions, etc.). The established fora for reconciling these interests are the
recognized standards bodies: the Institute of Electrical and Electronics Engineers (IEEE), the
International Electrotechnical Commission (IEC), the American National Standards Institute
(ANSI), the International Organization for Standardization (ISO), the Federal Information Processing
Standards (FIPS), and so forth.

As  we  saw  with  the  example  of  Microsoft  device  drivers,  however,  not  all  standards  must  be 
established  by  recognized  standards  bodies.  Other  standards  may  be  imposed  by  particular 
businesses  (for  example,  Microsoft),  by  standards  bodies  that  are,  in  effect,  proxies  for  one  or 
more businesses (for example, the Object Management Group), or for particular business segments
(for example, automotive and workflow management). Such standards are often called
(sometimes  pejoratively)  de  facto  standards.  Those  standards  tend  to  emphasize  concerns  such 
as  branding  (for  example,  “Microsoft  Certified”  or  “100%  Java”). 

13. This is an overstatement, but just barely. One has to trust something, but if foundational PCC is successful, all that must be
trusted are the foundations of formal logic.

The distinction between standard and de facto certification may not seem, on the surface, to be
significant. Branding is, after all, just another means of establishing trust, in this case
through product uniformity. On the other hand, there is no question that the approaches used to
establish sanctioned and de facto standards are radically different. Intuitively, deciding which
route to take for certified software components seems to be important in defining a starting
point for both the effort (e.g., a particular safety standard) and the strategy used to achieve
consensus.


4 Related Work


Meyer  has  long  been  an  eloquent  proponent  of  many  aspects  of  software  quality  and  has,  most 
recently,  been  associated  with  the  notion  of  trusted  components.  He  is  the  creator  of  design  by 
contract (DBC), a technique that embraces formal methods and testing to improve the programming
process and the quality of the resulting programs. Features to support DBC are
incorporated directly into Meyer’s Eiffel programming language, although DBC features have
appeared in other languages as well [Arnout 02], [Kramer 98].
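
As a rough illustration of the idea behind DBC, the following sketch attaches a precondition and a postcondition to a routine and checks both at run time. It is not Eiffel's syntax or Meyer's implementation: the `contract` decorator, the `ContractViolation` exception, and the `integer_sqrt` routine are hypothetical names chosen for this example.

```python
import functools
from typing import Callable


class ContractViolation(Exception):
    """Raised when a precondition or postcondition does not hold."""


def contract(pre: Callable[..., bool], post: Callable[..., bool]):
    """Wrap a routine so that `pre` is checked on entry and `post` on exit."""
    def decorate(fn):
        @functools.wraps(fn)
        def checked(*args):
            if not pre(*args):
                # The caller broke its side of the contract.
                raise ContractViolation(f"precondition of {fn.__name__} violated")
            result = fn(*args)
            if not post(result, *args):
                # The routine broke its side of the contract.
                raise ContractViolation(f"postcondition of {fn.__name__} violated")
            return result
        return checked
    return decorate


@contract(pre=lambda x: x >= 0,
          post=lambda r, x: r >= 0 and r * r <= x < (r + 1) * (r + 1))
def integer_sqrt(x: int) -> int:
    """Largest integer r such that r * r <= x (simple linear search)."""
    r = 0
    while (r + 1) * (r + 1) <= x:
        r += 1
    return r
```

A call such as `integer_sqrt(10)` returns 3 and passes both checks, while `integer_sqrt(-1)` raises `ContractViolation` before the routine body runs. The contract states what the routine promises independently of how it is computed, which is what makes it a candidate basis for both testing and proof.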

Meyer’s  most  recent  work  on  the  subject  of  trusted  components  is  geared  toward  establishing  a 
component  certification  center.  The  center  has  two  metaphorical  roads:  (1)  the  high  road, 
based  on  proofs  of  component  correctness  and  (2)  the  low  road,  based  on  a  rudimentary  but 
usable  component  quality  model  [Meyer  03].  It  is  not  clear  how  much  progress  has  been  made 
toward  the  goal  of  the  certification  center. 

Voas  has  been  an  outspoken  advocate  of  software  warranties  and  software  certification.  Much, 
but  not  all,  of  Voas’s  work  has  been  directed  to  the  use  of  commercial  off-the-shelf  software  in 
settings  requiring  a  high  level  of  quality  assurance.  Voas’s  notion  of  a  software  certification 
laboratory  [Voas  00]  has  a  similar  intent  to  Meyer’s  certification  center,  but  is  more  distributed 
in  operation  (at  least  in  concept,  since  no  such  laboratory  has  been  developed). 

Microsoft Corporation has several active areas of investigation and product development that
have some bearing on the topic of software certification. The SLAM toolkit [Ball 01] implements
a formal verification technique used to verify device drivers against a prespecified set of
safety conditions (for a definition of safety, see footnote 12). Various efforts are
based on Gurevich’s Abstract State Machine Language (AsmL) [Gurevich 04]; for example,
the use of monitors and “spying” for runtime contract verification [Barnett 01]. In particular,
intriguing work reported by Barnett and Schulte uses AsmL, runtime monitors, and features of
the COM component technology to perform runtime conformance checking between a component
implementation and a formal specification of its behavior. This is another example of
combining formal and empirical testing techniques (the other, already cited, is Havelund and
Rosu’s work [Havelund 01]).
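
The general shape of runtime conformance checking can be sketched as follows. This is only an analogy to the work cited above: it uses neither AsmL nor COM, and the executable model, the faulty implementation, and the `Monitor` class are hypothetical names introduced for illustration. The monitor runs each call on both the model and the implementation and reports any difference in observable results.

```python
from typing import Any, List, Optional


class ConformanceError(Exception):
    """Raised when the implementation's observable behavior departs from the model."""


class BoundedStackModel:
    """Executable specification (hypothetical): a stack that refuses pushes when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: List[Any] = []

    def push(self, value: Any) -> bool:
        if len(self.items) >= self.capacity:
            return False
        self.items.append(value)
        return True

    def pop(self) -> Optional[Any]:
        return self.items.pop() if self.items else None


class LeakyStack(BoundedStackModel):
    """A deliberately faulty implementation: it ignores the capacity limit."""

    def push(self, value: Any) -> bool:
        self.items.append(value)
        return True


class Monitor:
    """Runs each call on the model and the implementation and compares the results."""

    def __init__(self, implementation: Any, model: Any) -> None:
        self.implementation = implementation
        self.model = model

    def call(self, method: str, *args: Any) -> Any:
        expected = getattr(self.model, method)(*args)
        actual = getattr(self.implementation, method)(*args)
        if actual != expected:
            raise ConformanceError(
                f"{method}{args}: expected {expected!r}, got {actual!r}")
        return actual
```

With a capacity of 2, a monitor over `LeakyStack` accepts the first two pushes and raises `ConformanceError` on the third, because the model refuses that push and the implementation does not. The check is empirical in that it covers only the calls actually exercised, and formal in that the oracle is a specification rather than a hand-written expected value.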

The National Information Assurance Partnership (NIAP)14 is a U.S. government initiative,
initially conceived as a collaborative effort of the U.S. National Institute of Standards and
Technology (NIST) and the U.S. National Security Agency (NSA). The goal of the NIAP is “to
help increase the level of trust consumers have in their information systems and networks
through the use of cost-effective security testing, evaluation, and validation programs.” The
NIAP represents a traditional, normative approach to software certification.

14. For more information on the NIAP, go to http://niap.nist.gov/index.html.

NIAP certification norms are referred to as the common criteria and protection profiles; the latter
are product-category-specific “slices” through the common criteria (e.g., “firewall” and “operating
system”). The common criteria are based on the NSA’s “Orange Book” security norms
but have been updated to reflect the emergence of a robust marketplace in commercial software
technology. To date, over 120 products have been validated as compliant with some
aspect of the common criteria,15 and the NIAP has established 8 separate testing laboratories.

Underwriters Laboratories (UL) has developed the ANSI/UL 1998 standard for the safety of
software in programmable components. Like the NIAP, UL 1998 is normative. Unlike the
NIAP, however, UL 1998 places a much stronger emphasis on process norms, which is consistent
with the system-level safety certification standards discussed earlier. UL 1998 also shares
with the NIAP its specialization to the concerns of software, focusing not on the system as a
whole (the programmable component in UL 1998-speak), but rather on its software element.


15. For a list of those products, go to http://niap.nist.gov/cc-scheme/vpl/vpl_name.html.


5 Conclusions


Based on the discussions in this technical note, some preliminary observations about certification
in practice are that it

• emphasizes process more than product, subjective rather than objective, normative rather
than descriptive, determinate rather than predictive, and procedural rather than mechanical
approaches

• involves a recognized, possibly designated certifying authority; the certifying authority
need not be, and often is not, the same as the testing agent

• does not accommodate provisionality, at least in part because the emphasis on process and
subjective measures presents significant challenges to quantifying uncertainty

What do these observations mean for the PACC agenda? In contrast to the first bulleted
observation above, PACC emphasizes product rather than process, objective rather than subjective,
descriptive rather than normative, predictive rather than determinate, and mechanical
rather than procedural approaches to component trust. While the PACC notion of a certifiable
component is clearly quite different from current practice, it is not unprecedented, as evidenced
by the Microsoft device driver certification program. Since the Microsoft program is
driven by clear-eyed business considerations, its contrary emphasis (like that in PACC) should
be taken seriously.

The PACC concept of a certifiable component is likely to be best received when the certifying
authority is motivated primarily by bottom-line considerations; for example, manufacturers of
software or devices that have tight design tolerances and are constructed from or extended by
third-party software components. In contrast, the PACC concept of certifiable components will
be difficult to transition in two situations: (1) when the certifying authority is part of an official
regulatory process, since, in that setting, the bottom line is but one of many large-scale social
interests at stake and (2) when the regulators must be wary of being overly prescriptive and
imposing competitive or other economic hardship on manufacturers of software and
software-intensive systems.